The effectiveness of position- and composition-specific gap costs for protein similarity searches
Abstract
MOTIVATION: The greater flexibility in gap costs enjoyed by hidden Markov models (HMMs) is expected to give them better retrieval accuracy than position-specific scoring matrices (PSSMs). We attempt to quantify the effect of more general gap parameters in two ways: by separately examining the influence of position-specific and composition-specific gap scores, and by comparing the retrieval accuracy of PSSMs constructed with an iterative procedure to that of the HMMs provided by Pfam and SUPERFAMILY, which are curated ensembles of multiple alignments.

RESULTS: We found that position-specific gap penalties have an advantage over uniform gap costs. We did not explore optimizing distinct uniform gap costs for each query. For Pfam, PSSMs iteratively constructed from seeds based on HMM consensus sequences perform equivalently to HMMs adjusted to have constant gap transition probabilities, albeit with much greater variance. We observed no effect of composition-specific gap costs on retrieval performance. These results suggest possible improvements to the PSI-BLAST protein database search program.

AVAILABILITY: The scripts used to perform the evaluations are available from the authors upon request.
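To make the contrast between uniform and position-specific gap costs concrete, here is a minimal sketch. It is not the authors' evaluation code (which is available only on request) and not PSI-BLAST's scoring: it is a Smith-Waterman local alignment of a query sequence against a PSSM using Gotoh's affine-gap recurrences, where each profile column carries its own gap-open and gap-extension cost. The functions `local_profile_score` and `toy_pssm`, the toy scores (+2 for a column's consensus residue, -1 otherwise), and the gap convention (the first gap residue at column `c` costs `gap_open[c]`, each further one costs `gap_extend[c]`) are hypothetical choices made for illustration. Uniform gap costs are simply the case in which every column has the same values.

```python
from typing import Dict, List, Sequence

NEG_INF = float("-inf")


def toy_pssm(consensus: str, alphabet: str = "ACDEFGHIKLMNPQRSTVWY") -> List[Dict[str, float]]:
    """Hypothetical PSSM: +2 for a column's consensus residue, -1 for any other."""
    return [{a: (2.0 if a == c else -1.0) for a in alphabet} for c in consensus]


def local_profile_score(
    pssm: Sequence[Dict[str, float]],
    seq: str,
    gap_open: Sequence[float],
    gap_extend: Sequence[float],
) -> float:
    """Best local (Smith-Waterman/Gotoh) score of `seq` against a profile.

    gap_open[c] is charged for the first gap residue at profile column c and
    gap_extend[c] for each further one (an assumed convention). Uniform gap
    costs are the special case in which all entries are equal.
    """
    m, n = len(pssm), len(seq)
    assert len(gap_open) == m and len(gap_extend) == m
    # H[i][j]: best local score of an alignment ending at column i, residue j
    # D[i][j]: same, but ending in a deletion (column i has no residue)
    # I[i][j]: same, but ending in an insertion (residue j unmatched, after column i)
    H = [[0.0] * (n + 1) for _ in range(m + 1)]
    D = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    I = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    best = 0.0
    for i in range(1, m + 1):
        c = i - 1  # 0-based profile column whose scores and gap costs apply
        for j in range(1, n + 1):
            D[i][j] = max(H[i - 1][j] - gap_open[c], D[i - 1][j] - gap_extend[c])
            I[i][j] = max(H[i][j - 1] - gap_open[c], I[i][j - 1] - gap_extend[c])
            match = H[i - 1][j - 1] + pssm[c][seq[j - 1]]
            # Local alignment: a score never drops below 0 (start afresh).
            H[i][j] = max(0.0, match, D[i][j], I[i][j])
            best = max(best, H[i][j])
    return best


if __name__ == "__main__":
    profile = toy_pssm("ACDE")
    # The query "ACE" lacks the profile's D, so a full alignment needs a gap at column 2.
    print(local_profile_score(profile, "ACE", [3.0] * 4, [1.0] * 4))  # 4.0
    print(local_profile_score(profile, "ACE", [3.0, 3.0, 0.5, 3.0], [1.0] * 4))  # 5.5
```

With uniform costs the gapped alignment A-C-(gap)-E scores 6 - 3 = 3, so the ungapped "AC" match (4.0) wins; making the gap cheap only at the column where the family tolerates a deletion lets the full alignment score 6 - 0.5 = 5.5. This toy only illustrates the mechanism and says nothing about the retrieval-accuracy results reported above.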
Similar resources
First Record of HAdV-D20 Among Keratoconjunctivitis Patients in Iraq
Background: Human adenovirus species D (HAdV-D) is a common human viral pathogen, especially in eye infections. It consists of several types, of which HAdV-D8, -D19 and -D37 are common in eye infections. This study includes detection of the HAdV-D types implicated in conjunctivitis based on L2 (Penton protein) gene similarity. Methods: Conjunctival swabs were collected from keratoconjunctivitis patient...
Physicochemical Position-Dependent Properties in the Protein Secondary Structures
Background: Establishing theories for designing arbitrary protein structures is complicated and depends on understanding the principles of protein folding, which is affected by applied features. Computer algorithms can reach high precision and stability in computationally designing enzymes and binders by applying informative features obtained from natural structures. Methods: In this study, a ...
Rapid similarity search of proteins using alignments of domain arrangements
MOTIVATION Homology search methods are dominated by the central paradigm that sequence similarity is a proxy for common ancestry and, by extension, functional similarity. For determining sequence similarity in proteins, most widely used methods use models of sequence evolution and compare amino-acid strings in search for conserved linear stretches. Probabilistic models or sequence profiles capt...
Genetic variation of some Iranian Hyoscyamus landraces based on seed storage proteins
The genus Hyoscyamus belongs to the tribe Hyoscyameae Miers of the Solanaceae family. Variation in protein bands elucidates the relationships among collections from various geographical regions. In this study the seed storage protein diversity of 19 accessions of Hyoscyamus (H. niger, H. reticulatus and H. pusillus) from West Azerbaijan (Iran) was investigate...
Determining specific species and the species contribution in the similarity between soil seed bank and standing vegetation (Case study: Lazour rangeland, Firouzkooh)
Determining the potential of soil seed bank and its specific species is important for conservation goals and vegetation restoration of rangelands. In this study, the characteristics of soil seed bank and standing vegetation in Lazour mountain rangeland were investigated in order to estimate the rehabilitation ability of the study area in case of possible disturbances. In order to determine the ...